Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sparse checkout and set list of service directories based upon the project list #29151

Merged
merged 6 commits into from
Jun 2, 2022

Conversation

JimSuplizio
Copy link
Member

There are a few things with this update:

  1. With these changes every Job will have two environment variables defined. SparseCheckoutDirectories, a JSON formatted list to be consumed by sparse-checkout and ServiceDirectories, a JSON formatted list of service directories.
  2. generate_from_source_pom.py was being used for its side effect which was to set an environment variable of JSON formatted partial paths to be used by sparse checkout. This script shouldn't be running except when the ClientFromSourcePom.xml is actually needed. Set the variable all the time without requiring a command line argument. Also added the setting of ServiceDirectories which contains a list of service directories generated from the artifacts list.
  3. Added eng\scripts\Generate-ServiceDirectories-From-Project-List.ps1 - Given a list of libraries, set SparseCheckoutDirectories and ServiceDirectories. Compared to generate_from_source_pom.py, which needs to climb through the messy tree of dependents, this script is pretty bare. It generates the environment variables strictly from the project list. The only non-standard thing in there is how unreleaed modules are treated; To avoid dealing with dependency trees, just add the service directories to the unreleased library dependencies to every sparse checkout. At worst, it'll add a couple of seconds onto a sync if there are any unreleased dependencies that month.

@JimSuplizio JimSuplizio added the EngSys This issue is impacting the engineering system. label May 31, 2022
@JimSuplizio JimSuplizio self-assigned this May 31, 2022
@JimSuplizio
Copy link
Member Author

/azp run java - code-quality-reports - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JimSuplizio
Copy link
Member Author

/azp run java - core - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@benbp benbp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some syntax nits that might help prevent bugs down the line. Otherwise LGTM.

@JimSuplizio
Copy link
Member Author

/azp run java - core - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JimSuplizio
Copy link
Member Author

/azp run java - code-quality-reports - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JimSuplizio
Copy link
Member Author

/azp run java - core - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JimSuplizio
Copy link
Member Author

The first java - core - ci I'd kicked off this morning after updating my forked repo had a hang in one of the tests that sat there for 38 minutes. This caused me to have to cancel that run and kick off another one.

@JimSuplizio JimSuplizio merged commit a5a1f9a into Azure:main Jun 2, 2022
@JimSuplizio JimSuplizio deleted the ServiceAndSparseCheckout branch June 2, 2022 16:42

- template: /eng/common/pipelines/templates/steps/sparse-checkout.yml
parameters:
Paths: $(CheckoutDirectories)
Paths: $(SparseCheckoutDirectories)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I was trying to figure out where this is set and until I reviewed the script it wasn't clear that is what set it. In other cases we have made that a parameter into the script so things were a little more connected. It looks like you are setting 2 variables which is probably why you opted not to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously they were passing this into the python script and calling it prior to every sparse checkout that wasn't the default. There's no real point in passing in an environment variable to set if it's always being set in multiple places. Further, I'd added a second variable for ServiceDirectires which isn't yet being used but will be so I saw no point in passing in variables that were being expected.

SkipDefaultCheckout: true

- ${{ parameters.PreBuildSteps }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intent to move the PreBuildSteps after the sparse checkout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional, otherwise there's a chicken or egg problem with the pre build scripts not being checked out with the initial sparse checkout pass.


$sparseCheckoutDirHash = @{}
$serviceDirHash = @{}
foreach($file in Get-ChildItem -Path $SourcesDirectory -Filter pom*.xml -Recurse -Depth 3 -File) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should consider using Get-java-PackageInfoFromRepo from language-settings.ps1 to try and start using a common function for gathering properties for the projects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard, we'd have a chicken and egg there. That function requires the service directory. The common functions, for Java anyways, are far less common than one would think. Things under common aren't really common outside of the fact that they're specifically being used for a targeted piece of processing (like docs updating or .MD file verifications) and are mostly not useful outside of the context in which they were initially written.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that function should work without a servicedirectory as well. Mostly I was wanting to try and have a common function we can use to parse and produce the properties.


$temp = ConvertTo-Json @($sparseCheckoutDirectories | Sort-Object | Get-Unique) -Compress
Write-Host "setting env variable SparseCheckoutDirectories = $temp"
Write-Host "##vso[task.setvariable variable=SparseCheckoutDirectories;]$temp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the limit is but I suspect there is a limit on the size of a DevOps variable or a Env variable. So might be something we need to think about.

Copy link
Member Author

@JimSuplizio JimSuplizio Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard - Previously the sparse checkout was granular down to the artifact's directory in some cases there were 120+ items added to the sparse checkout (like core's FromSource run for example). I changed things to just simply do the service directory which turned 120+ into like 30. If there's a limit it would have been hit long before I'd made my changes which dramatically trimmed things down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the docs for it right now, but I remember the limit being large enough that it hasn't been a concern thus far.

# output the SparseCheckoutDirectories environment variable
sparse_checkout_paths = list(sorted(sparse_checkout_directories))
print('setting env variable SparseCheckoutDirectories = {}'.format(sparse_checkout_paths))
print('##vso[task.setvariable variable=SparseCheckoutDirectories;]{}'.format(json.dumps(sparse_checkout_paths)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any good reason to do this in both scripts as opposed to just running both scripts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scripts generate dramatically different lists for the same list of artifacts. The generate_from_source_pom produces a list of sparse checkout directories that's necessary for the FromSource run which includes the service directory for the list and every service directory that has a dependency on any project in the project list. It also generates the ClientFromSourcePom.xml which is only needed in two places. Generate-ServiceDirectories-From-Project-List.ps1 only generates the list of service directories for the project list and nothing more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EngSys This issue is impacting the engineering system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants